Compilation of a list of the most common phrases used in reading was begun with the rationale that the quick recognition of phrases would facilitate reading comprehension. The first effort showed that categorizing phrases by parts of speech did not provide acceptable levels of accuracy. The system that was effective, however, used a computer program that recorded every consecutive two- and three-word sequence in the text sample and determined which of these word strings recurred most frequently. The computer program makes possible sampling of large amounts of text (50,000 words or more), thus eliminating the idiosyncrasies of text sampling. The researchers who developed this system believe that the common phrases it identifies should be taught in much the same manner as common words are now taught in beginning reading instruction. (BL)
The ability to quickly associate meaning with units of written language is considered crucial to the comprehension of text (Smith, 1971). Among the units of written language the reader must process are individual words, phrases, clauses, sentences and discourse structures. Word lists have been compiled for reading instruction with the criterion that the words be common and that these common words be taught early in reading instruction; several such word lists have become popular and are widely used by classroom teachers (Thorndike and Lorge; Dale and Chall, 1948; Carroll, Davies and Richman, 1971; Harris and Jacobson, 1972). Lists of common phrases (or common word strings), however, are not found in the research literature or in instructional materials (basal readers, manuals, workbooks, etc.), even though it is believed that the quick recognition of phrases will facilitate comprehension. The only available list of common phrases is the one compiled by Dolch (1948) thirty years ago.

The authors gratefully acknowledge the help of Lee Congdon, who developed the computer program discussed in this report, and Robert Hieb, who made many trial runs in the process of the program development.

The purpose of this report is to provide a rationale, or a justification, for the need to identify common word strings in text. This justification touches upon some theories of language and/or reading processing and presents our implications for reading instruction. In addition, we will describe some of the stages that brought us to the point where we felt we could actually parse word strings from text with a computer and identify the most common.

Both "word strings" and "phrases" have been used at this point to indicate word groups where the words appear together in text. A more precise definition of intra-sentence word groups such as phrases, clauses and strings is deferred until a following section.



Significance of the Problem

The more automatic the recognition of the chunks of language being read, and the less effort expended on decoding, the greater the likelihood of complete comprehension. LaBerge and Samuels (1974) and Samuels (1976) refer to this as automatic decoding or automaticity. Samuels (1975) states that "in order to have both fluent reading and good comprehension, the student must go beyond accuracy to automaticity in decoding" (p. 323).



In other words, the reader has a limited amount of cognitive energy, or ability, or memory with which to accomplish the reading task; the more cognitive energy used for decoding, the less for comprehension. The development of automaticity probably begins at the word level, but LaBerge and Samuels state that if the reader

    begins to organize some of the words into short groups or phrases
    as he reads, then further repetitions can strengthen these units
    as well as word units. In this way he can break through
    word-by-word reading and apply the benefits of further repetitions
    to automatization of larger units. (p. 315)

The importance of "phrase reading" over "word reading" is demonstrated by differences in the eye fixations of naive and fluent readers. For example, first-grade children may make two fixations per word, whereas high-school seniors make one fixation for about every two words (Taylor, Frackenpohl, and Pettee, 1960). And in a study of third- and sixth-grade readers, Rode (1974-75) found that the eye-voice span was longer for the older readers, suggesting that the older readers attempted to decode larger units of meaning.

The work of Wisher (1976, 1977) provides further evidence that the reader uses his understanding of syntax "to parse word strings into convenient processing units" (p. 601). It is likely that understanding, or semantic integration, occurs between phrases and clauses (more likely clauses, but that discussion is beyond the scope of this paper). Further support is provided by Fodor and Bever (1965), who found that listeners group words (for understanding) according to the syntax of the sentence.

The importance of being able to read phrases has been discussed by a number of reading educators, including Bond and Tinker (1975), Harris and Sipay (1975), Heilman (1972), Heilman and Holmes (1972), and Zintz (1975); they believe that good readers organize the text they read into meaningful units such as phrases. However, many poor readers do not do this, and their comprehension is poor even when they have been pre-taught each individual word in the selection (Oakan, Wiener, and Cromer, 1971). It has also been found that training in the reading of phrases has improved the reading of remedial students (Amble, 1967).

Phrases, Clauses, and Word Strings



In our earliest efforts we were interested in identifying common, recurring phrases such as prepositional phrases. For reasons which will be explained later, those efforts were unsuccessful, so we resorted to identifying common "word strings." At this point a discussion of what is meant by phrases, clauses and word strings is appropriate. In the sentence below there is a noun phrase (Little children) followed by a verb phrase (were playing), which is followed by a prepositional phrase (in the park).

    Little children were playing in the park.


The noun phrase and the verb phrase (Little children were playing) form a main or independent clause. Any group of consecutive words (Little children, Little children were, children were playing, were playing in, playing in the, in the park, and so on) constitutes a word string.

While transformational grammar theory does not provide for the categorization of phrases according to parts of speech, we found the traditional labels useful. In transformational grammar, a sentence may be divided into a noun phrase and a verb phrase. Additional information in this area may be found in DeStefano (1978) and Jacobs and Rosenbaum (1968).

We were, and are, primarily interested in common word groups which we expected to be true phrases according to a traditional grammarian's definition. However, a more appropriate descriptor for the word groups we identified is the term "word strings."


Early Efforts at Parsing Phrases

Because we wanted to be able to analyze large amounts of text (initially we felt at least [illegible] words), the application of computer technology was a critical part of our work. That certain kinds of analyses can be accomplished through the use of computers has been demonstrated (Kucera and Francis, 1967; Carroll, Davies and Richman, 1971; Harris and Jacobson, 1972; Moe, 1973; and Hopkins and Moe, 1975). However, this study required programming of a somewhat different nature.

We identified five types of phrases (prepositional, participial, gerund, infinitive and verb) which commonly appear in written materials. Since it was anticipated that prepositional phrases could be identified by the computer with a high degree of accuracy, we worked with a computer programmer to develop such a program. By programming the computer to locate all prepositions (with a list of 52 prepositions stored in the computer's memory) and then parse out the preposition and the two-word string which followed it, we found that it was indeed possible for the computer to identify these three-word strings with 99% accuracy. That is, we missed only about 1% of the prepositional phrases.
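The first-pass rule described above (find a listed preposition, then take it together with the two words that follow) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original program: the preposition set is a short hypothetical sample standing in for the report's list of 52, and the tokenizer simply lowercases the text and drops punctuation.

```python
import re

# Hypothetical, abbreviated preposition list; the report's list of 52
# prepositions is not reproduced here.
PREPOSITIONS = {"in", "on", "at", "of", "to", "with", "from", "by", "for", "under"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def candidate_prepositional_phrases(text, prepositions=PREPOSITIONS):
    """Return each listed preposition together with the two words after it.

    This is the naive rule: every three-word string that starts with a
    listed preposition is treated as a prepositional phrase. A preposition
    with fewer than two words after it is skipped.
    """
    words = tokenize(text)
    candidates = []
    for i in range(len(words) - 2):
        if words[i] in prepositions:
            candidates.append(" ".join(words[i:i + 3]))
    return candidates
```

For "Little children were playing in the park." the sketch returns `["in the park"]`. For "She went to sleep early." it returns `["to sleep early"]`, a string that begins with a listed preposition but functions as an infinitive rather than a prepositional phrase; this is the kind of false match that the rule cannot rule out on its own.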
A problem arose, however, in that even though these three-word strings began with a preposition, they did not all function as prepositional phrases. We then eliminated from the list those prepositions which rarely seemed to function as the first word in prepositional phrases. After many program revisions and trial runs we were able to parse out almost all (97-99%) of the prepositional phrases in the text. However, we were still parsing out many word strings which were not prepositional phrases. And when we examined the strings which had been parsed out, only about 62% were actual prepositional phrases; we found this level of accuracy to be unacceptable.
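The two percentages above measure different things. In present-day terms (which the report itself does not use), the 97-99% figure is recall, the share of the text's actual prepositional phrases that the program extracted, while the 62% figure is precision, the share of the extracted strings that really were prepositional phrases. A sketch of both measures follows, assuming each phrase occurrence is identified by its word position so that repeated phrases are counted separately; the function name and data layout are hypothetical.

```python
def recall_and_precision(true_phrases, extracted):
    """Compare program output against hand-identified phrases.

    Both arguments are sets of (position, string) pairs.
    recall    = share of the true phrases that the program found
    precision = share of the extracted strings that are true phrases
    """
    hits = len(true_phrases & extracted)
    recall = hits / len(true_phrases) if true_phrases else 0.0
    precision = hits / len(extracted) if extracted else 0.0
    return recall, precision
```

With one true phrase `{(4, "in the park")}` and the output `{(4, "in the park"), (2, "to sleep early")}`, recall is 1.0 but precision is only 0.5, the same pattern as in the report: nearly every real phrase is caught, yet many non-phrases are caught with them.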

Later Efforts

We then decided to approach the problem of identifying common phrases from a completely new perspective. Rather than categorizing phrases by parts of speech, we developed another computer program which identified every consecutive two- and three-word sequence found in the written text, stored it in memory, and, at the end of all text input, tabulated all possible two- and three-word strings.
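The tabulation step lends itself to a short sketch. The version below is an assumption about how such a program might be written today, not a reconstruction of the original; the function names are hypothetical, strings are counted within each text sample (so none crosses from one sample into the next), and sentence boundaries inside a sample are ignored for simplicity.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_strings(words, n):
    """Yield every consecutive n-word sequence in a list of words."""
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def tabulate_strings(texts, lengths=(2, 3)):
    """Count every consecutive two- and three-word string across all samples."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        for n in lengths:
            counts.update(word_strings(words, n))
    return counts


counts = tabulate_strings([
    "Little children were playing in the park.",
    "The dog ran in the park.",
])
print(counts.most_common(3))
# [('in the', 2), ('the park', 2), ('in the park', 2)]
```

Ranking the tabulated strings by count, as `most_common` does here, is one way to pick out the strings that "recur most frequently"; with a corpus of tens of thousands of words the top of that ranking would be far more stable than in this two-sentence example.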

Through much trial and error, the investigators were able to develop a program that parsed out common word strings which are, by traditional definitions, actual phrases or which are the first two or three words of an actual phrase. Some of the common word strings identified, however, cannot be categorized by traditional definitions (and are, therefore, simply referred to as common strings). Once the new program was operational, a corpus of 15,000 words analyzed previously with the old program was reanalyzed. This analysis led us to decide that if we were going to make claims that we had identified common word strings in written text, many, many more samples of written text needed to be analyzed. In order to eliminate the idiosyncrasies of text sampling, we believe that large amounts of text (over 50,000 words) should be used in subsequent analyses.

Major Implications

There appears to be little disagreement that more able readers process larger chunks of text more rapidly than less able readers. And it is agreed, we think, that our instructional practices should be such that our students are led to the point where they may, with a single fixation, read whole phrases of two or three or four words. How children should be brought to this point, however, may be a debatable issue among reading educators. We believe that common phrases should be taught in much the same manner in which common words are taught, and a suggested procedure is presented here.

If we know that "in the" and "of the", for example, are common word strings in text, then it seems reasonable that they be taught as a group with a noun found in the text the students are to read. Since "in", "the", and "of" become part of a reader's sight vocabulary very early, they will already be familiar to the student. The task is to help the student quickly read the function word(s) together with the content word, which may or may not be part of the student's sight vocabulary.

Assume, for example, that the student has a sight vocabulary of approximately 100 words and that the words "street" and "pond" are to be introduced as new words in a lesson. The teacher will first discuss the concepts, or the meanings, of "street" and "pond" with the student. Then the teacher will present the printed form of the word (either in isolation or in context). If the student is to become a rapid reader, and a rapid comprehender of text, then the reader should be able to read the phrases "in the street" and "in the pond" quickly, since the meaning of the phrase is not in the word "in" or in the word "the" but primarily in the word "street", and more completely in the phrase itself. A similar case may be made for the presentation of larger chunks such as clauses, and the procedures would be much the same.
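The suggested procedure pairs a common string such as "in the" with a new content word found in the text the students will read. A hypothetical helper for pulling such practice phrases out of a lesson text might look like the sketch below; the report describes the procedure for teachers and does not specify any software for it, so the function name and inputs are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def phrase_cards(text, common_strings, new_words):
    """Find phrases in a lesson text where a common string precedes a new word.

    common_strings: common word strings such as "in the" (hypothetically
        taken from a frequency tabulation of a large corpus).
    new_words: content words being introduced in the lesson, e.g. "street".
    Returns the matching phrases in order of first appearance, without duplicates.
    """
    words = tokenize(text)
    new = {w.lower() for w in new_words}
    patterns = [s.lower().split() for s in common_strings]
    found = []
    for i in range(len(words)):
        for parts in patterns:
            end = i + len(parts)
            # The common string plus one following word must fit in the text.
            if end < len(words) and words[i:end] == parts and words[end] in new:
                phrase = " ".join(words[i:end + 1])
                if phrase not in found:
                    found.append(phrase)
    return found


print(phrase_cards("The children played in the street and fished in the pond.",
                   ["in the", "of the"], ["street", "pond"]))
# ['in the street', 'in the pond']
```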

Our purpose was to develop a system to identify common word strings. Since students must go beyond the word level in beginning reading, we believe that the use of common word strings found in text will facilitate the reader's ability to handle larger and larger units of text.
